Abstract

Quantum arithmetic circuits have practical applications in various quantum algorithms.
In this paper, we address quantum addition on 2-dimensional nearest-neighbor architectures
based on the work presented by Choi and Van Meter (JETC 2012). To this end, we propose
new circuit structures for some basic blocks in the adder, and reduce communication overhead
by adding concurrency to consecutive blocks and also by parallel execution of expensive Toffoli
gates. The proposed optimizations reduce total depth from $140\sqrt{n} + k_1$ to $92\sqrt{n} + k_2$ for
constants $k_1$, $k_2$ and affect the computation fidelity considerably.

1 Introduction

Quantum algorithms are often described in the quantum circuit model of computation, where for
a quantum circuit with $n$ qubits, any pair of qubits can interact. However, current
physical quantum technologies only allow qubit interactions in one-, two-, or three-dimensional
spaces. Restricting interactions to a linear (1D) arrangement results in $O(n)$ overhead. On the other
hand, working with 2D (or 3D) quantum architectures where each qubit can interact with 4 (or
6) neighboring qubits provides more flexibility.

For a given quantum circuit $C$ one can construct an interaction graph $G_c = (V_c, E_c)$, the
nodes of which represent qubits in $C$, with an edge between two nodes when a gate in $C$ involves the
corresponding qubits. Additionally, the architecture (or fabric) of a quantum computing system can be
described by a simple connected graph $G_q = (V_q, E_q)$ where vertices $V_q$ represent qubits and
edges $E_q$ represent adjacent qubit pairs that gates can be applied on [1]. Accordingly, the problem
of mapping a quantum circuit $C$ with arbitrary interactions between qubits onto a quantum
architecture with limited interaction distance can be mapped to the problem of embedding graph
$G_c$ into graph $G_q$.

In general, the graph embedding problem is NP-hard. However, optimal embedding methods
with polynomial time complexities for several classes of graphs have been proposed [5]. In [3],
the concept of dilation in graph embedding has been applied to find a depth lower bound for
a quantum circuit after embedding. In this case, dilation is defined as the maximum distance
between adjacent nodes of the graph after embedding. Working with proven properties of
log-depth binary trees and considering the fact that log-depth quantum addition circuits exist, Choi
and Van Meter [3] showed that the depth lower bound of the exact quantum addition circuit
on a $k$-dimensional quantum architecture is $\Omega(\sqrt[k]{n})$. In [3], the authors examined the minimum
overhead in depth for emulating a circuit $C$ by a circuit $C'$ subject to the
interaction constraints and showed that this overhead is $O(n)$ for 1D, $O(\sqrt{n})$ for 2D, $O(\log n)$
or O(logn) (depending on the approach) for hypercube. 
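The notions of interaction graph and dilation can be illustrated with a minimal sketch. The gate list, the two placements, and the helper names `interaction_graph` and `dilation` below are hypothetical, chosen only for illustration. Distance is taken as Manhattan distance on a grid, and a multi-qubit gate is assumed to contribute an edge between every pair of its qubits.

```python
from itertools import combinations


def interaction_graph(gates):
    """Build the edge set of the interaction graph from a list of gates.

    Each gate is a tuple of the qubit indices it acts on. Assumption: a gate
    on more than two qubits adds an edge between every pair of its qubits.
    """
    edges = set()
    for qubits in gates:
        for u, v in combinations(sorted(set(qubits)), 2):
            edges.add((u, v))
    return edges


def dilation(edges, placement):
    """Maximum Manhattan distance between interacting qubits after placement.

    placement maps a qubit index to its (row, column) position on the grid.
    """
    def dist(u, v):
        (r1, c1), (r2, c2) = placement[u], placement[v]
        return abs(r1 - r2) + abs(c1 - c2)

    return max(dist(u, v) for u, v in edges)


# Hypothetical 4-qubit circuit: three two-qubit gates and one three-qubit gate.
gates = [(0, 1), (1, 2), (0, 3), (0, 1, 2)]
edges = interaction_graph(gates)

grid_2x2 = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}  # 2D placement
line_1x4 = {q: (0, q) for q in range(4)}                  # 1D placement

print(dilation(edges, grid_2x2), dilation(edges, line_1x4))
```

A dilation of 1 would mean every gate already acts on neighboring qubits; larger values indicate that SWAP gates are needed before some gate can be applied.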

Exploring an efficient realization of a given quantum algorithm or quantum circuit for a
restricted architecture has been pursued by a number of researchers in recent years. Physical
implementations of the quantum Fourier transform (QFT) [HIS], Shor's factorization algorithm
[3 [HI in]; quantum error correction [10], and general reversible circuits [11] for 1D/2D architectures
have been explored in the past. Worst-case synthesis cost of a general/Boolean unitary matrix
under the 1D restriction has been discussed in [12], [13], [14], [15]. In [16], [17], [18] heuristic methods
for converting an arbitrary quantum circuit to its equivalent circuit on 1D architectures have been
proposed.

Figure 1: Decomposition of the Toffoli gate into one-qubit and six CNOT gates;
implementation with adjacent qubits.

Quantum adders and their modular versions have applications in different quantum algorithms
including Shor's factoring algorithm. In [19], a quantum adder with $O(\sqrt{n})$ depth on 2D quantum
architectures was proposed which has depth $140\sqrt{n} - 72$ in terms of one- and two-qubit quantum
gates. Asymptotically, the depth of that adder is optimal. However, constant-factor
optimization is possible and in fact desirable. Besides the effect of reducing circuit size/depth
on physical realization, any additional gate on the longest path of the circuit can reduce circuit fidelity
to some extent. Based on the analysis done in [20] for fault-tolerant error correction with a
concatenated 7-qubit CSS code [21], nearest-neighbour communication overhead results in a 175x
reduction in error threshold. Improving the error threshold is costly and may require a more
sophisticated quantum control protocol to obtain gates with higher fidelities, or a more
robust error correction code. Therefore, reducing unnecessary communication overhead for a useful
quantum computation is vital. Because of the effect of addition on, e.g., modular multiplication
and modular exponentiation circuits [HI [22] [23], reducing communication overhead for the quantum
adder by circuit optimization, which is the focus of this work, is of particular interest.

In this paper, we show how the $140\sqrt{n} + \text{const}$ depth in [19] can be further improved to $92\sqrt{n} + \text{const}$.
For this purpose, we reconsider the basic blocks in the suggested quantum adder and introduce
some constant-factor optimizations in communication overhead in different stages. To physically
implement a given circuit, one needs to decompose all gates into primitive one- and two-qubit
gates. To decompose a 3-qubit Toffoli ($\mathcal{T}$) gate, we use Clifford+T gates which are universal and
have fault-tolerant (FT) implementations [21]. Figure 1 shows the decomposition of the Toffoli gate
into one- and two-qubit gates. We report circuit depth in terms of single-qubit,
CNOT ($C$) and SWAP ($S$) gates. The rest of this paper is organized as follows. In Section 2, the
method in [19] is discussed. We introduce the reduction techniques in Section 3. The result of
the proposed reductions is analyzed in Section 4 and Section 5. We finally conclude the paper in
Section 6.
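As an illustration of a Clifford+T decomposition of the Toffoli gate with six CNOTs, the sketch below simulates the widely used textbook sequence on all eight basis states and checks that it acts as a Toffoli gate. This is an assumption-laden example: the sequence is the standard one with seven T/T† gates and is not claimed to match Figure 1 gate for gate, and it ignores the SWAP gates needed for adjacency. The helper names are hypothetical.

```python
import cmath
import math

A, B, C = 0, 1, 2  # qubit indices: controls a, b and target c (bit positions)


def apply_1q(state, q, m):
    """Apply the 2x2 matrix m to qubit q of a 3-qubit state vector."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        bit = (i >> q) & 1
        for new_bit in (0, 1):
            j = (i & ~(1 << q)) | (new_bit << q)
            out[j] += m[new_bit][bit] * amp
    return out


def apply_cnot(state, ctrl, tgt):
    """Apply CNOT with the given control and target qubits."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        j = i ^ (1 << tgt) if (i >> ctrl) & 1 else i
        out[j] += amp
    return out


R = 1 / math.sqrt(2)
ONE_QUBIT = {
    "H": [[R, R], [R, -R]],
    "T": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
    "TDG": [[1, 0], [0, cmath.exp(-1j * math.pi / 4)]],
}

# Textbook sequence (assumed here): 2 H, 7 T/T-dagger and 6 CNOT gates.
SEQUENCE = [
    ("H", C), ("CX", B, C), ("TDG", C), ("CX", A, C), ("T", C),
    ("CX", B, C), ("TDG", C), ("CX", A, C), ("T", B), ("T", C),
    ("H", C), ("CX", A, B), ("T", A), ("TDG", B), ("CX", A, B),
]


def run(state):
    """Apply SEQUENCE, in order, to a 3-qubit state vector."""
    for gate in SEQUENCE:
        if gate[0] == "CX":
            state = apply_cnot(state, gate[1], gate[2])
        else:
            state = apply_1q(state, gate[1], ONE_QUBIT[gate[0]])
    return state


def is_toffoli(tol=1e-9):
    """Check that SEQUENCE maps every basis state as a Toffoli gate would."""
    for i in range(8):
        state = [0j] * 8
        state[i] = 1
        out = run(state)
        both_controls = (i >> A) & 1 and (i >> B) & 1
        expected = i ^ (1 << C) if both_controls else i
        for j in range(8):
            target_amp = 1 if j == expected else 0
            if abs(out[j] - target_amp) > tol:
                return False
    return True


print(is_toffoli())
```

The check compares amplitudes exactly (no global phase is factored out), so it also confirms that the sequence introduces no relative phases between basis states.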



2 Quantum Addition on 2D Architectures 

In this section, we describe the circuit structure in [19] for quantum addition on 2D architectures.
For an $n$-qubit quantum circuit, the method in [19] arranges the qubits in $\sqrt{n} \times \sqrt{n}$ arrays where
each qubit can interact with its four neighboring qubits at no additional cost. Additionally, the
circuit is divided into 3 phases which are executed sequentially. In the first phase, ripple-carry
addition is performed on the first column, and carry-lookahead addition is performed on the other
$\sqrt{n} - 1$ columns. In the second phase, carry propagation is performed between columns, and
finally in phase 3 carry generation and summation are performed.

In the first phase, after using a half-adder and $\sqrt{n} - 1$ full-adders, output carries $c_2, \ldots, c_{\sqrt{n}+1}$
will be available. This is done in $32\sqrt{n} - 17$ unit-time steps in [19]. The carry-lookahead addition in
the other columns produces

$$g_{k\sqrt{n}+j} = a_{k\sqrt{n}+j} \cdot b_{k\sqrt{n}+j} \qquad (1)$$

$$p_{k\sqrt{n}+j} = a_{k\sqrt{n}+j} \oplus b_{k\sqrt{n}+j} \qquad (2)$$

for $1 \le k \le \sqrt{n} - 1$ and $1 \le j \le \sqrt{n}$. After computing $g_i$ and $p_i$ values in all columns in parallel,
$G[i,j]$ and $P[i,j]$ are computed in serial based on (3) and (4) for $1 \le k \le \sqrt{n} - 1$ and $2 \le j \le \sqrt{n}$,
where $G[k\sqrt{n}+1, k\sqrt{n}+1] = g_{k\sqrt{n}+1}$ and $P[k\sqrt{n}+1, k\sqrt{n}+1] = p_{k\sqrt{n}+1}$. This part takes
$34\sqrt{n} - 19$ time steps in [19]. Accordingly, the first phase in [19] results in $34\sqrt{n} - 19$ time steps.

$$G[k\sqrt{n}+1, k\sqrt{n}+j] = g_{k\sqrt{n}+j} \oplus p_{k\sqrt{n}+j} \cdot G[k\sqrt{n}+1, k\sqrt{n}+j-1] \qquad (3)$$

$$P[k\sqrt{n}+1, k\sqrt{n}+j] = p_{k\sqrt{n}+j} \cdot P[k\sqrt{n}+1, k\sqrt{n}+j-1] \qquad (4)$$

Table 1: Basic blocks in 2D adder [19] and their depths in terms of unit-cost gates. The last term
in total depth (i.e., 3) represents 2 NOTs and one CNOT gate used to construct the final output
in [19].

| Name | #steps: gate sequence | Circuit |
|---|---|---|
| H, T, CNOT (C), SWAP (S) | 1 | |
| Toffoli ($\mathcal{T}$(a,b,0)) | 14: 2 S + 12 1-qubit | H(0) C(b,0) T†(0) S(b,0) C(a,b) T(b) C(0,b) T†(b) C(a,b) S(b,0) T(b) T(0) C(a,b) H(0) T(a) T†(b) C(a,b) |
| Half-adder(a,b,0) | 15: 1 $\mathcal{T}$ + 1 C | $\mathcal{T}$(a,b,0) C(a,b) |
| Full-adder(c,a,b,0) | 32: 2 $\mathcal{T}$ + 2 C + 2 S | $\mathcal{T}$(a,b,0) C(a,b) S(c,a) $\mathcal{T}$(a,b,0) C(a,b) S(c,a) |
| g,p(a,b,0) | 15: 1 $\mathcal{T}$ + 1 C | $\mathcal{T}$(a,b,0) C(a,b) |
| G,P(P,G,a,p,g,0) | 34: 2 $\mathcal{T}$ + 6 S | S(G,a) S(P,G) $\mathcal{T}$(a,p,g) S(G,a) S(g,0) $\mathcal{T}$(a,p,g) S(G,a) S(P,G) S(G,a) |
| Column_carry(P,G,C) | 18: 1 $\mathcal{T}$ + 4 S | S(P,G) $\mathcal{T}$(C,G,P) S(G,C) S(P,G) S(G,C) |
| Carry(P,G,a,p,C) | 18: 1 $\mathcal{T}$ + 4 S | S(P,G) S(p,C) S(a,p) $\mathcal{T}$(a,G,P) S(G,a) S(P,G) |
| Carry1(p,g,c) | 16: 1 $\mathcal{T}$ + 2 S | S(g,c) $\mathcal{T}$(p,g,c) S(p,g) |
| SUM(c,P,a,p) | 5: 1 C + 4 S | S(c,P) S(P,a) C(a,p) S(P,a) S(c,P) |
| SUM1(c,a,p) | 3: 1 C + 2 S | S(c,a) C(a,p) S(c,a) |
| SUM2(p,c) | 1: 1 C | C(c,p) |
| phase 1 | $34\sqrt{n} - 19$: g,p + ($\sqrt{n} - 1$) G,P | |
| phase 2 | $18\sqrt{n} - 18$: ($\sqrt{n} - 1$) Column_carry | |
| phase 3 | $18\sqrt{n} + 1$: ($\sqrt{n} - 1$) Carry + Carry1 + SUM1 | |
| clearing ancillae | $70\sqrt{n} - 39$: phase 1 + phase 2 + phase 3 - SUM1 | |
| total depth | $140\sqrt{n} - 72$: phase 1 + phase 2 + phase 3 + clearing ancillae + 3 | |



In the second phase, column-level carries are computed as shown in (5) for $1 \le k \le \sqrt{n} - 1$ in
$18\sqrt{n} - 18$ time steps.

$$c_{(k+1)\sqrt{n}+1} = G[k\sqrt{n}+1, (k+1)\sqrt{n}] \oplus c_{k\sqrt{n}+1} \cdot P[k\sqrt{n}+1, (k+1)\sqrt{n}] \qquad (5)$$



In phase 3, output carries are calculated sequentially as

$$c_{k\sqrt{n}+j+1} = G[k\sqrt{n}+1, k\sqrt{n}+j] \oplus c_{k\sqrt{n}+1} \cdot P[k\sqrt{n}+1, k\sqrt{n}+j] \qquad (6)$$

for $1 \le k \le \sqrt{n} - 1$ and $j = \sqrt{n} - 1, \ldots, 1$.



Finally, addition outputs are calculated as shown in (7) for $1 \le k \le \sqrt{n} - 1$ and $1 \le j \le \sqrt{n}$.
Altogether, operations in phase 3 can be performed in $18\sqrt{n} + 1$ time steps.

$$s_{k\sqrt{n}+j} = a_{k\sqrt{n}+j} \oplus b_{k\sqrt{n}+j} \oplus c_{k\sqrt{n}+j} \qquad (7)$$



Considering the three subcircuits for phase 1, phase 2, and phase 3 in sequence leads to
$70\sqrt{n} - 36$ time steps in [19]. Applying the inverse circuit to clear ancillae leads to $140\sqrt{n} - 72$
time steps for the complete adder.

Based on equations (1)-(7), Table 1 reports circuit depth in different blocks. In this table,
we used the same notation as in [19] for circuit blocks: g,p to compute $g_i$, $p_i$ values in (1) and (2);
G,P to compute $G[i,j]$ and $P[i,j]$ values in (3) and (4); Column_carry to compute column-level
carries in (5); Carry & Carry1 to compute carries in (6); and SUM, SUM1 & SUM2 to compute
final outputs in (7).
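The three phases can be checked classically. The sketch below is a minimal Boolean simulation, assuming $n = m^2$ with $m = \sqrt{n}$ and column $k$ (counted from 0) holding bits $km+1, \ldots, km+m$. It evaluates equations (1)-(7) on ordinary bits and compares the result with integer addition. It models only the logic of the adder, not the quantum circuit, its qubit layout, or its depth; the function names are hypothetical.

```python
def bits_of(x, n):
    """Bits of x with 1-based indexing (index 0 is unused padding)."""
    return [0] + [(x >> i) & 1 for i in range(n)]


def add_2d(a_val, b_val, m):
    """Add two n-bit integers, n = m*m, following equations (1)-(7).

    Here G[k, j] and P[k, j] stand for G[k*m+1, k*m+j] and P[k*m+1, k*m+j].
    """
    n = m * m
    a, b = bits_of(a_val, n), bits_of(b_val, n)
    c = [0] * (n + 2)  # c[i] is the carry into bit i; c[1] = 0

    # Phase 1, first column: ripple-carry (half-adder, then full-adders).
    for i in range(1, m + 1):
        c[i + 1] = (a[i] & b[i]) ^ (c[i] & (a[i] ^ b[i]))

    # Phase 1, other columns: g, p from (1), (2) and G, P from (3), (4).
    g = [0] + [a[i] & b[i] for i in range(1, n + 1)]
    p = [0] + [a[i] ^ b[i] for i in range(1, n + 1)]
    G, P = {}, {}
    for k in range(1, m):
        base = k * m
        G[k, 1], P[k, 1] = g[base + 1], p[base + 1]
        for j in range(2, m + 1):
            G[k, j] = g[base + j] ^ (p[base + j] & G[k, j - 1])  # (3)
            P[k, j] = p[base + j] & P[k, j - 1]                  # (4)

    # Phase 2: column-level carries, equation (5).
    for k in range(1, m):
        c[(k + 1) * m + 1] = G[k, m] ^ (c[k * m + 1] & P[k, m])

    # Phase 3: carries inside each column, equation (6).
    for k in range(1, m):
        for j in range(1, m):
            c[k * m + j + 1] = G[k, j] ^ (c[k * m + 1] & P[k, j])

    # Sum bits, equation (7), plus the final carry-out as the top bit.
    s = [a[i] ^ b[i] ^ c[i] for i in range(1, n + 1)]
    return sum(bit << i for i, bit in enumerate(s)) + (c[n + 1] << n)


# Exhaustive check for n = 4 (m = 2).
print(all(add_2d(x, y, 2) == x + y for x in range(16) for y in range(16)))
```

Within a column, the carries of (6) depend only on the column-level carry from (5), which is why phase 2 can be completed before any of the phase 3 carries are formed.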

3 The Proposed 2D Adder 



In this section, we revise the basic blocks in [19] and introduce additional parallelism in
various parts to reduce circuit depth. Basically, the proposed optimizations are based on (1) new
circuit structures for the Carry and SUM basic blocks, (2) reducing communication overhead in
Column_carry, (3) parallel execution of expensive Toffoli gates in G,P blocks as well as in Full-adders,
and (4) reducing interaction overhead by adding concurrency to consecutive blocks.

3.1 New Circuits 

Working with the same circuit structures as in [19] for the Half-adder, g,p, and G,P blocks as reported
in Table 1, we define several new structures for the other blocks.

• Full-adder: The first $\mathcal{T}$ and C gates in the Full-adder blocks in [19] can be executed in
parallel with the gates in the Half-adder circuit. This saves one $\mathcal{T}$ and one C for all $\sqrt{n} - 1$
Full-adders.

• Column_carry: Figure 4 shows the new structure of the Column_carry block. In this circuit,
$c[k\sqrt{n}+1]$ is from the previous column (e.g., $c_4$ in Figure [5]). After the computation, the new
carry, e.g., $c_7$, is moved down to be used by the next Column_carry block. The previous
carry, e.g., $c_4$, is placed next to the Carry module. This new structure saves 1 SWAP gate.

• Carry: Figure 5 shows the new structure for the Carry block. Since $c[k\sqrt{n}+1]$ is required to
compute all carries in different rows, $c[k\sqrt{n}+1]$ is moved up in this figure. On the other
hand, the generated carry is required to compute sum values, and hence is moved down.
This new circuit uses 5 SWAP gates (vs. 4 in [19]).

• SUM: Applying the proposed circuit for Carry results in adjacent $c[k\sqrt{n}+j+1]$ and
$p[k\sqrt{n}+j+1]$ values (see Figure 5). Based on (7), sum outputs can be computed by a single
CNOT gate. This saves the 4 SWAP gates used in [19]. In order to construct $s_i$ values on $b_i$ qubits,
one needs to add one SWAP gate $S(p[k\sqrt{n}+1], c[k\sqrt{n}+1])$. However, this SWAP gate can
be removed because of an identical SWAP gate in the Carry circuit. Accordingly, we define
another circuit block, Carry1, which excludes the SWAP on the $c[k\sqrt{n}+1]$ and $P[k\sqrt{n}+1][k\sqrt{n}+j]$
(for $j = 1$) qubits. We do not need the SUM1 and SUM2 blocks in the proposed 2D adder
structure.



3.2 Reducing Communication Overhead 

To use adjacent gates in the 2D quantum adder, we use a set of SWAP gates inside each circuit 
block. The added SWAP gates are used for communication between those gates required for the 
computation. In other words, the added SWAP gates are not required for the computation, and 
should be reduced as much as possible. Independent optimization of different blocks can reduce 
communication overhead inside each subcircuit, but has no view about the neighboring subcircuits. 
In this section, we consider consecutive circuit blocks to reduce communication overhead further. 
Note that the optimizations given in this section are based on the new circuit blocks given in
Section 3.1.



• G,P → Carry: Reconsider (3), (4), and (6) and note that the result of Column_carry in
(5), i.e., $c[k\sqrt{n}+1]$, is constructed on the last qubit in the Carry block (see Figure 4 and
Figure 5). Figure 6 shows the blocks in sequence. To simplify the circuit, note that the
last three SWAP gates in G,P can be moved to the right. Next, the resulting circuit can be
reconstructed as shown in Figure 6(b). Accordingly, three SWAP gates in each G,P block
can be saved. Figure 7 shows the new circuits for Carry and Carry1. Note that some of the G,P
blocks are directly connected to the Carry (or Carry1) blocks without any interaction with
Column_carry blocks. For such cases, we can apply the same mechanism.

• G,P → G,P: Each G,P block constructs two outputs based on (3) and (4) where
$G[k\sqrt{n}+1, k\sqrt{n}+j]$ depends on $G[k\sqrt{n}+1, k\sqrt{n}+j-1]$ and $P[k\sqrt{n}+1, k\sqrt{n}+j]$ depends on
$P[k\sqrt{n}+1, k\sqrt{n}+j-1]$. Since $G[k\sqrt{n}+1, k\sqrt{n}+j]$ is constructed first, we can use it to
construct $G[k\sqrt{n}+1, k\sqrt{n}+j+1]$ in parallel to the construction of $P[k\sqrt{n}+1, k\sqrt{n}+j-1]$.
This can save one Toffoli and one SWAP. Figure 8 shows the result of this optimization.

Figure 2: The revised block diagram of the 2D 9-bit adder in [19] based
on the blocks used in this paper. The critical path in this circuit is
g,p → G,P → ColCarry → ColCarry → CARRY → CARRY1 → SUM. The $C^{-1}$ block is the
reverse of the circuit shown in the dashed box. This reverse circuit, together with the NOTs and CNOTs
shown, is applied to clear ancillae in [19]. Except for ColCarry (Column_carry), the number
of inputs and outputs for other modules are the same as the ones shown in this figure. In
Column_carry, the number of inputs/outputs is 3, i.e., the first line and the last two lines are
actual inputs and outputs. Note that these three lines are neighbors in the 2D layout. The qubit
placement for this 2D grid and their values during the computation (up to clearing ancillae) are
given in Figure 3.



4 Depth Analysis 

In this section, we analyze the circuit depth of a 2D n-bit quantum adder based on the circuit 
structures proposed for each block. 

• Phase 1 (Half-adder + Full-adder): We can execute the Half-adder and the first two gates
($\mathcal{T}$ + C) in all Full-adders in parallel. This results in $1\mathcal{T} + 1C + (\sqrt{n} - 1)(2S + 1C + 1\mathcal{T})$ time
steps.

• Phase 1 (g,p + G,P): Each g,p block includes one Toffoli gate and one CNOT gate. Except
for the first G,P block, the other $\sqrt{n} - 2$ G,P blocks include 3 SWAPs and 1 Toffoli. The
first G,P block includes two Toffoli and two SWAP gates. Altogether, circuit depth can be
calculated as $(1\mathcal{T} + 1C) + (2\mathcal{T} + 2S) + (\sqrt{n} - 2)(3S + 1\mathcal{T})$.

• Phase 2 (Column_carry): There are $\sqrt{n} - 1$ Column_carry blocks in cascade. This
results in $(\sqrt{n} - 1)(1\mathcal{T} + 3S)$ time steps.

• Phase 3 (Carry + SUM): There are $\sqrt{n} - 2$ Carry blocks followed by one Carry1 block
and one SUM block. Therefore, circuit depth is $(\sqrt{n} - 2)(1\mathcal{T} + 4S) + (3S + 1\mathcal{T}) + 1C$.
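The per-phase expressions above can be evaluated numerically. The sketch below assumes the unit costs used in [19] (CNOT = 1, SWAP = 1, Toffoli = 14) and writes $m$ for $\sqrt{n}$. It combines the phases in the same way as the total-depth rows of the tables (forward pass, uncomputation without the final SUM block, plus 3 steps for the final NOT and CNOT gates). The function names are hypothetical.

```python
def depth_proposed(m, toffoli=14, swap=1, cnot=1):
    """Depth of the proposed adder for m = sqrt(n), from the phase formulas."""
    t, s, c = toffoli, swap, cnot
    phase1_ripple = (t + c) + (m - 1) * (2 * s + c + t)        # first column
    phase1_gp = (t + c) + (2 * t + 2 * s) + (m - 2) * (3 * s + t)
    phase2 = (m - 1) * (t + 3 * s)
    phase3 = (m - 2) * (t + 4 * s) + (3 * s + t) + c
    forward = phase1_gp + phase2 + phase3
    clearing = forward - c                                      # minus SUM
    return {
        "phase1_ripple": phase1_ripple,
        "phase1_gp": phase1_gp,
        "phase2": phase2,
        "phase3": phase3,
        "total": forward + clearing + 3,
    }


def depth_reference(m):
    """Depth of the adder in [19] for m = sqrt(n), from the Table 1 block depths."""
    phase1 = 15 + (m - 1) * 34            # g,p + (m - 1) G,P
    phase2 = (m - 1) * 18                 # (m - 1) Column_carry
    phase3 = (m - 1) * 18 + 16 + 3        # (m - 1) Carry + Carry1 + SUM1
    forward = phase1 + phase2 + phase3
    clearing = forward - 3                # minus SUM1
    return {"phase1": phase1, "phase2": phase2, "phase3": phase3,
            "total": forward + clearing + 3}


for m in (3, 10, 32):  # n = 9, 100, 1024
    print(m * m, depth_proposed(m)["total"], depth_reference(m)["total"])
```

With the default costs the two totals reduce to $104\sqrt{n} - 46$ and $140\sqrt{n} - 72$. The ripple-carry part of phase 1 ($17\sqrt{n} - 2$ steps) runs alongside the g,p and G,P blocks and is shorter than them, so it does not enter the total.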



| - | $a_4$ | $a_7$ |
| - | $b_4, p_4, s_4$ | $b_7, p_7, s_7$ |
| $a_1$ | $0, g_4, c_4$ | $0, g_7, c_7$ |
| $b_1, s_1$ | $a_5$ | $a_8$ |
| $0, c_2$ | $b_5, p_5, s_5$ | $b_8, p_8, s_8$ |
| $a_2$ | $0, g_5, P[4,5], c_4, c_5$ | $0, g_8, P[7,8], c_7, c_8$ |
| $b_2, s_2$ | $0, G[4,5], P[4,5]$ | $0, G[7,8], P[7,8]$ |
| $0, c_3$ | $a_6$ | $a_9$ |
| $a_3$ | $b_6, p_6, s_6$ | $b_9, p_9, s_9$ |
| $b_3, s_3$ | $0, g_6, P[4,6], c_4, c_6$ | $0, g_9, P[7,9], c_7, c_9$ |
| $0, c_4$ | $0, G[4,6], c_7$ | $0, G[7,9], c_{10}$ |

Figure 3: The qubit placement for the 2D grid in Figure 2 and their values during the computation.

Figure 4: (a) Circuit structure for Column_carry based on (5). Note that $c[(k-1)\sqrt{n}+1]$ and
$P[(k-1)\sqrt{n}+1][k\sqrt{n}]$ are not adjacent (see Figure 2). (b) Circuit in (a) with adjacent gates. (c)
Circuit in (b) with relabelled qubits to show adjacent qubits.

Table 2 reports the circuit depth for each component and the total depth of the proposed 2D
quantum adder. As can be seen in this table, circuit depth is improved by a factor of || (i.e.,
24%).

In [25], a new circuit for the Peres gate with depth $5C + 3$ has been proposed (Figure 10(a)). After
inserting one CNOT (to obtain a Toffoli gate) and two SWAP gates to make all gates adjacent, one can use
the new circuit with depth $6C + 2S + 4$ in order to further optimize the proposed 2D adder. Note
that in [25], a circuit structure for the Toffoli gate with depth $6C + 2$ has been proposed too (Figure 9).
However, working with the Peres gate results in a more compact circuit in terms of the number
of SWAP gates. Following this path results in depth $92\sqrt{n} + \text{const}$ for the proposed 2D quantum
adder. Table 3 compares circuit depth based on different costs for Toffoli and SWAP gates.
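The effect of the Toffoli cost on the leading term can be sketched as follows. The assumption here is that the depth $6C + 2S + 4$ (12 steps with unit-cost CNOT and SWAP gates) is simply substituted for the 14-step Toffoli in every block on the critical path. Only the coefficient of $\sqrt{n}$ is computed, not the constant term, and the function name is hypothetical.

```python
def depth_coefficient(toffoli, swap=1):
    """Coefficient of sqrt(n) in the total depth of the proposed adder.

    One G,P block (3S + 1T), one Column_carry block (1T + 3S) and one Carry
    block (1T + 4S) are added per column; the factor 2 accounts for the
    reverse circuit that clears the ancillae.
    """
    per_column = (3 * swap + toffoli) + (toffoli + 3 * swap) + (toffoli + 4 * swap)
    return 2 * per_column


print(depth_coefficient(toffoli=14))             # Toffoli cost used in [19]
print(depth_coefficient(toffoli=6 + 2 * 1 + 4))  # 6C + 2S + 4 with unit costs
```

Under this assumption the coefficient is $6t + 20$ for Toffoli cost $t$ and unit-cost SWAPs, which gives 104 for $t = 14$ and 92 for $t = 12$, consistent with the depths stated in the text.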

5 Error Correction 

To protect quantum information from errors due to, e.g., noise or decoherence, quantum error
correction (QEC) should be used in any large-scale quantum computation. In recent years,
various models for QEC have been proposed [21]. A common technique, known as concatenated
quantum coding, is to encode a logical qubit into the state of several physical qubits (e.g., 7 in the
Steane code and 9 in the Bacon-Shor code [21], both for one level of concatenation).

Figure 5: Circuit structure for Carry based on (6). Inputs $a[k\sqrt{n}+j+1]$ and $p[k\sqrt{n}+j+1]$ are
not used in the computation.

Figure 6: (a) G,P, Column_carry, and Carry blocks in cascade. The three rightmost SWAP gates
in G,P can be merged with gates in the Carry block to construct a new circuit shown in (b).

Figure 7: New circuit structures for Carry (a) and Carry1 (b) based on the optimization shown
in Figure 6. Note that the first SWAP gate can be executed in parallel with gates in the previous
block (see Figure [7]).

Let us assume each unitary operation should be followed by quantum error correction for proper
computation. This results in an aggressive quantum error correction mechanism. In some
circumstances, one may insert error correction after several operations, instead of after each operation.
Consider a quantum computation $U$ with $N_U$ logical operations which include only FT quantum
gates. Moreover, assume that error correction for each FT gate requires $N_e$ physical
instructions. $N_e$ includes SWAPs required for communication. Normally, $N_e$ differs for various logical
operations; however, we can consider the worst-case value among all FT gates. Working with
concatenated quantum error correction techniques, the total physical gate count at concatenation
level $L$ can be estimated as $N_L = N_{L-1} + N_{L-1} \times N_e$ or $N_L \approx N_{L-1} \times N_e$. We have $N_0 = N_U$, and
therefore, $N_L = N_U (N_e)^L$. Accordingly, besides the effect of the proposed approach on circuit
depth, one can implement the proposed 2D adder with fewer gates; the reduction factor is ||.
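The gate-count estimate $N_L = N_U (N_e)^L$ can be sketched as a short recurrence. The numeric values below are hypothetical and chosen only to show the scaling; `exact=True` uses the recurrence $N_L = N_{L-1} + N_{L-1} \times N_e$ instead of the approximation.

```python
def physical_gate_count(n_logical, n_e, level, exact=False):
    """Estimate the physical gate count at a given concatenation level.

    n_logical: number of logical FT operations (N_U).
    n_e:       worst-case physical instructions per error correction (N_e).
    level:     concatenation level L.
    """
    count = n_logical
    for _ in range(level):
        count = count + count * n_e if exact else count * n_e
    return count


# Hypothetical values: 100 logical gates, N_e = 10, two levels of concatenation.
print(physical_gate_count(100, 10, 2))              # N_U * N_e**2
print(physical_gate_count(100, 10, 2, exact=True))  # N_U * (1 + N_e)**2
```

Because the estimate is linear in $N_U$, any reduction in the logical gate count of a circuit carries over unchanged to every concatenation level.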

6 Conclusion 

We considered a quantum adder on 2D quantum architectures. Our work is based on the results
reported in [19] with several improvements. In particular, we optimized the building blocks of
the 2D adder with a focus on reducing the communication overhead required in 2D quantum
architectures. Having optimized consecutive blocks, the proposed adder can execute expensive Toffoli
gates concurrently in several locations. The suggested optimizations improve the depth from $140\sqrt{n} + k_1$
in [19] to $92\sqrt{n} + k_2$ for constants $k_1$ and $k_2$.

Table 2: Circuit depth for our blocks in 2D adder. Circuit depths for CNOT (C), SWAP (S), and
Toffoli ($\mathcal{T}$) gates are considered as 1, 1, and 14 as done in [19].

| Block | Circuit | Ours | [19] |
|---|---|---|---|
| Half-adder | $1\mathcal{T} + 1C$ | 15 | 15 |
| Full-adder | $2S + 1C + 1\mathcal{T}$ | 17 | 32 |
| g,p | $1\mathcal{T} + 1C$ | 15 | 15 |
| G,P (first) | $2\mathcal{T} + 2S$ | 30 | 34 |
| G,P (others) | $3S + 1\mathcal{T}$ | 17 | 34 |
| Column_carry | $1\mathcal{T} + 3S$ | 17 | 18 |
| Carry | $1\mathcal{T} + 4S$ | 18 | 18 |
| Carry1 | $3S + 1\mathcal{T}$ | 17 | 18 |
| SUM | $1C$ | 1 | 5 |
| Phase1-1 | $1\mathcal{T} + 1C + (\sqrt{n} - 1)(2S + 1C + 1\mathcal{T})$ | $17\sqrt{n} - 2$ | $32\sqrt{n} - 17$ |
| Phase1-2 | $(1\mathcal{T} + 1C) + (2\mathcal{T} + 2S) + (\sqrt{n} - 2)(3S + 1\mathcal{T})$ | $17\sqrt{n} + 11$ | $34\sqrt{n} - 19$ |
| Phase2 | $(\sqrt{n} - 1)(1\mathcal{T} + 3S)$ | $17\sqrt{n} - 17$ | $18\sqrt{n} - 18$ |
| Phase3 | $(\sqrt{n} - 2)(1\mathcal{T} + 4S) + (3S + 1\mathcal{T}) + 1C$ | $18\sqrt{n} - 18$ | $18\sqrt{n} + 1$ |
| clearing ancillae | Phase1-2 + Phase2 + Phase3 - SUM | $52\sqrt{n} - 25$ | $70\sqrt{n} - 39$ |
| 2D Adder | Phase1-2 + Phase2 + Phase3 + clearing ancillae + 3 | $104\sqrt{n} - 46$ | $140\sqrt{n} - 72$ |

Figure 8: Construction of $G[k\sqrt{n}+1, k\sqrt{n}+j+1]$ can be done in parallel to construction of
$P[k\sqrt{n}+1, k\sqrt{n}+j-1]$ in two consecutive G,P blocks. The right circuit shows the new circuit
structure for G,P (except for the first G,P block).

Figure 9: Toffoli decomposition with depth $6C + 2$ [25].

Table 3: Circuit depth for the proposed adder and the one in [19] considering different costs for
Toffoli and SWAP gates.

Figure 11: A 9-bit adder based on the proposed blocks. Carry, $G_{i,j}$, $p_i$, and $g_i$ values are shown in this figure.
The $C^{-1}$ block is the reverse of the circuit shown in the dashed box, applied with the NOTs and CNOTs shown
to clear ancillae. All gates act on adjacent qubits in the 2D layout. For qubit locations see
the table in Figure 3.



